Back to Article
Article Notebook
Download Source

Classifying drug-induced webs using Topological Data Analysis

Authors
Affiliation

Guilherme Vituri F. Pinto

Unesp

Telmo

??????

Published

February 6, 2026

Abstract

We studied etc etc etc

Keywords

Topological Data Analysis, Persistent homology

1 Setup

1.1 Imports

In [1]:
# === TDAweb Modules ===
using TDAweb
using TDAweb.Preprocessing: load_web, image_to_array, image_to_r2
using TDAweb.TDA: rips_pd, rich_stats, persistence_entropy,
                  vectorize_diagram, pd_distance_matrix, loocv_knn_wasserstein
using TDAweb.Analysis: mds_embedding, classification_report, knn_predict,
                       cross_validate, test_group_differences, stats_to_matrix,
                       cohens_d, pairwise_drug_comparison, feature_importance_permutation,
                       plot_wing_with_pd,
                       # Distance combination
                       combine_distances, loocv_knn_distance, optimize_alpha,
                       # Binary classification
                       binary_classification_control_vs_rest, roc_curve_control,
                       # Separability metrics
                       within_between_ratio, silhouette_by_class, pairwise_group_distances,
                       # PERMANOVA
                       permanova, permanova_control_vs_drugs, drug_equivalence_test,
                       pairwise_drug_permutation_tests, pairwise_confusion_analysis
using TDAweb.Visualization: plot_betti_curves_by_group, plot_distance_heatmap_sorted,
                            plot_3d_scatter, plot_mds_2d, plot_avg_persistence_images,
                            plot_confusion_matrix, plot_feature_importance,
                            plot_feature_boxplot,
                            # New plots
                            plot_alpha_optimization, plot_roc_curve,
                            plot_separability_comparison, plot_silhouette_by_class,
                            plot_binary_classification_summary, plot_pairwise_group_heatmap

# === Data & Stats ===
using DataFrames, DataFramesMeta
using TidierFiles: list_files
using StatsBase: median, mean, zscore, std
using Distances: pairwise, Euclidean
using MultivariateStats
using PersistenceDiagrams: Wasserstein, Bottleneck

# === Plotting ===
import Plots as pt
using Images: mosaicview, imresize
[ Info: Precompiling TDAweb [33e9cfd9-3fd8-44a9-876f-d405e84153eb] (cache misses: include_dependency fsize change (2))



SYSTEM: caught exception of type :MethodError while trying to print a failed Task notice; giving up

WARNING: Imported binding PersistenceDiagrams.Silhouette was undeclared at import time during import to TDA.

The WebIO Jupyter extension was not detected. See the WebIO Jupyter integration documentation for more information.

WARNING: Detected access to binding `WebIO.webio_serve` in a world prior to its definition world.
  Julia 1.12 has introduced more strict world age semantics for global bindings.
  !!! This code may malfunction under Revise.
  !!! This code will error in future versions of Julia.
Hint: Add an appropriate `invokelatest` around the access to this binding.
To make this warning an error, and hence obtain a stack trace, use `julia --depwarn=error`.

2 Dataset Description

2.1 Sample Composition

This analysis uses N=25 spider web images collected from spiders exposed to different agricultural chemicals:

Group N Drug Type Mechanism of Action
CONTROL 5 None Baseline web structure
CIPERMETRINA 5 Insecticide (Pyrethroid) Synthetic pyrethroid; disrupts sodium channels causing tremors and impaired motor control
ENDOSULFAN 5 Insecticide (Organochlorine) GABA antagonist; causes seizures and neurological disruption (banned in many countries)
GLIFOSATO 5 Herbicide (Glyphosate) Glycine analog; disputed neurotoxicity in arthropods
SPINOSAD 5 Insecticide (Organic) Bacterial metabolite; nicotinic acetylcholine agonist causing paralysis
Missing Metadata

The original dataset lacks several critical experimental details:

  • Spider species identification
  • Drug dosages and exposure protocols
  • Spider age, sex, and size
  • Environmental conditions (temperature, humidity, light)
  • Time post-exposure when webs were photographed

This limits biological interpretation and generalizability of findings.

2.2 Biological Context: Why Study Spider Webs?

Spider web construction is a sensitive bioassay for neurotoxicity. Building a web requires:

  1. Precise motor control: Accurate silk placement and anchor point selection
  2. Spatial memory: Following geometric patterns and maintaining symmetry
  3. Proprioception: Body position awareness during construction

Drugs affecting the nervous system disrupt these processes, manifesting as structural changes visible in the web geometry.

2.2.1 Historical Precedent

Spider web pharmacology dates to 1948 (Witt et al.): drugs like caffeine, LSD, and marijuana produce characteristic web deformations. Modern applications include:

  • Environmental toxicology monitoring
  • Pesticide safety assessment
  • Neurological drug screening

2.2.2 Expected Drug Effects

Based on mechanism of action:

  • Pyrethroids (CIPERMETRINA): Sodium channel disruption → tremors → irregular silk placement
  • Organochlorines (ENDOSULFAN): GABA antagonist → seizures → chaotic structures
  • Glyphosate (GLIFOSATO): Herbicide with disputed neurotoxicity → unclear effect expected
  • Spinosad: Nicotinic agonist → paralysis → incomplete or simplified webs

2.3 Why TDA for This Problem?

Traditional image analysis methods (edge detection, Fourier analysis, texture features) struggle with spider webs because:

  1. Geometric irregularity: Webs don’t follow rigid templates
  2. Scale variation: Cell sizes vary across the web
  3. Partial structures: Incomplete or torn webs

TDA Advantages:

  • Topological invariance: Robust to rotation, scaling, and small perturbations
  • Multi-scale analysis: Persistence diagrams capture features at all scales simultaneously
  • Interpretable features: H0 = fragmentation, H1 = loop structure (cells/meshes)
  • No template required: Data-driven rather than model-based
TDA Hypothesis

We hypothesize that drug-induced neurological impairment will manifest as:

  • H1 features (closed loops) decreasing under drugs that impair motor coordination
  • H0 features (fragmentation) increasing if drugs cause severe behavioral disruption
  • Persistence entropy decreasing under drugs that produce irregular cell patterns

Topological features should provide more sensitive detection than simple metrics like web area or thread count.

3 Data Loading

3.1 Load Images

In [2]:
files = list_files("images")
species = map(files) do file
    split(file, "/")[2]
end

imgs = load_web.(files, blur = 2)
As = image_to_array.(imgs)
Xs = image_to_r2.(As, threshold = 0.1);
In [3]:
df = DataFrame(File = files, Specie = species, Image = imgs)
unique_species = unique(species)
println("Groups: ", unique_species)
println("Samples per group:")
for sp in unique_species
    n = count(==(sp), species)
    println("  $sp: $n")
end
Groups: SubString{String}["CIPERMETRINA", "CONTROL", "ENDOSULFAN", "GLIFOSATO", "SPINOSAD"]
Samples per group:
  CIPERMETRINA: 5
  CONTROL: 5
  ENDOSULFAN: 5
  GLIFOSATO: 5
  SPINOSAD: 5

3.2 Sample Size and Statistical Power

Statistical Power Considerations

Dataset Size: N=25 total (5 samples per group)

This sample size has important implications for statistical inference:

3.2.1 Detection Limits

With N=5 per group, we can reliably detect only large effect sizes (Cohen’s |d| > 0.8):

  • Small effects (|d| < 0.5): Very low power (~20-30%)
  • Medium effects (|d| = 0.5-0.8): Moderate power (~40-60%)
  • Large effects (|d| > 0.8): Adequate power (~70-80%)

3.2.2 Statistical Concerns

  1. Wide confidence intervals: Effect size estimates are imprecise
  2. High variance: Cross-validation results show substantial standard deviations
  3. No independent validation: All results use LOOCV on the same 25 samples
  4. Overfitting risk: Classification models may capture sample-specific noise

3.2.3 Study Interpretation

Given these limitations, this analysis should be interpreted as:

Proof-of-concept demonstrating TDA methodology ✓ Exploratory analysis generating hypotheses ✓ Method validation showing feasibility

NOT definitive biological conclusions ✗ NOT generalizable without replication ✗ NOT powered for detecting subtle effects

3.2.4 Recommendations for Future Work

  • Minimal viable study: N ≥ 20 per group
  • Well-powered study: N ≥ 30 per group
  • Independent validation cohort: 70/30 train/test split or multi-site data

Results presented here represent upper bounds on performance (likely optimistic due to overfitting) and should guide future adequately-powered studies.

3.3 Point Cloud Sampling

We extract 1000 points from each web using the farthest point sample algorithm.

In [4]:
samples = map(Xs) do X
    farthest_points_sample(X, 1000)
end;

4 Persistence Diagrams

4.1 Compute Rips Filtration

In [5]:
bcs = map(samples) do s
    rips_pd(s, cutoff = 5)
end;

4.2 Extract H0 and H1

In [6]:
# H0: connected components (web fragmentation)
# H1: loops/cycles (cells in web)
# Ripserer returns tuple of (H0, H1, ...) diagrams

# Safely extract diagrams - handle cases where diagrams might be empty or missing
bcs_0 = [length(bc) >= 1 ? bc[1] : [] for bc in bcs]
bcs_1 = [length(bc) >= 2 ? bc[2] : (length(bc) >= 1 ? bc[end] : []) for bc in bcs]

println("Number of samples: ", length(bcs))
println("H0 diagram example (first non-empty): ")
for (i, bc) in enumerate(bcs_0)
    if !isempty(bc)
        println("  Sample $i: ", length(bc), " features")
        break
    end
end
println("H1 diagram example (first non-empty): ")
for (i, bc) in enumerate(bcs_1)
    if !isempty(bc)
        println("  Sample $i: ", length(bc), " features")
        break
    end
end;
Number of samples: 25
H0 diagram example (first non-empty): 
  Sample 1: 1000 features
H1 diagram example (first non-empty): 
  Sample 1: 71 features

4.3 Representative Web Examples

Before diving into feature extraction and statistics, let’s visualize representative spider webs from each treatment group along with their persistence diagrams. Each panel shows:

  • Top left: Persistence diagram (birth-death plot showing H1 cycles)
  • Top right: Web image intensity heatmap
  • Bottom: Point cloud sample used for TDA computation
In [7]:
# Show one representative web per group (first sample from each)
for sp in unique_species
    idx = findfirst(==(sp), species)
    if !isnothing(idx) && !isempty(bcs_1[idx])
        p = plot_wing_with_pd(bcs_1[idx], As[idx], samples[idx], sp)
        display(p)
    end
end

5 Feature Extraction

Understanding TDA Features

We extract several statistical summaries from each persistence diagram. Here’s what each feature measures and how it relates to web structure:

Feature What it measures Web interpretation
n_features Number of H1 cycles detected Number of closed cells in the web
total_persistence Sum of all cycle lifespans Overall topological complexity
median_persistence Median cycle lifespan Typical cell “robustness” (robust to outliers)
max_persistence Largest cycle lifespan Most prominent hole or cell
entropy Uniformity of cycle lifespans High = regular cells; Low = irregular cells
median_birth Median scale at which cycles appear Typical cell size (robust to outliers)

These features transform complex persistence diagrams into interpretable numbers that can be compared statistically across treatment groups.

5.1 Rich Statistics - H1 (Cycles)

In [8]:
stats_h1 = [rich_stats(bc) for bc in bcs_1];

DataFrame(
    Specie = species,
    n_features = [s.n_features for s in stats_h1],
    total_persistence = round.([s.total_persistence for s in stats_h1], digits=3),
    median_persistence = round.([s.median_persistence for s in stats_h1], digits=3),
    max_persistence = round.([s.max_persistence for s in stats_h1], digits=3),
    entropy = round.([s.entropy for s in stats_h1], digits=3)
)
25×6 DataFrame
Row Specie n_features total_persistence median_persistence max_persistence entropy
SubStrin… Int64 Float64 Float64 Float64 Float64
1 CIPERMETRINA 71 831.438 9.631 66.449 4.079
2 CIPERMETRINA 69 779.229 8.354 37.007 4.076
3 CIPERMETRINA 66 925.464 11.833 58.847 4.033
4 CIPERMETRINA 69 777.478 10.289 28.721 4.138
5 CIPERMETRINA 50 499.91 8.61 22.183 3.811
6 CONTROL 68 473.123 6.0 19.114 4.159
7 CONTROL 81 577.74 6.403 14.111 4.357
8 CONTROL 138 1086.26 6.606 38.022 4.836
9 CONTROL 108 764.702 6.43 16.632 4.643
10 CONTROL 93 626.066 6.275 12.905 4.507
11 ENDOSULFAN 87 648.155 6.298 31.075 4.379
12 ENDOSULFAN 68 543.676 6.945 22.366 4.145
13 ENDOSULFAN 80 812.353 6.842 80.652 4.111
14 ENDOSULFAN 59 610.397 6.896 58.875 3.785
15 ENDOSULFAN 61 616.653 8.022 55.703 3.941
16 GLIFOSATO 60 965.556 10.522 107.758 3.752
17 GLIFOSATO 43 695.187 10.05 60.592 3.474
18 GLIFOSATO 72 868.896 8.868 79.585 4.049
19 GLIFOSATO 54 695.542 9.893 68.599 3.765
20 GLIFOSATO 67 866.073 9.56 51.756 4.025
21 SPINOSAD 70 793.129 9.768 37.691 4.093
22 SPINOSAD 69 827.155 10.814 53.565 4.099
23 SPINOSAD 72 780.85 8.964 38.603 4.137
24 SPINOSAD 64 747.34 9.938 27.148 4.04
25 SPINOSAD 78 831.255 9.219 57.754 4.216

5.2 H0 (Connected Components) - Not Analyzed

Why H0 is Not Analyzed

All spider webs in this dataset remain structurally connected (single component), meaning H0 persistence provides minimal discriminatory information between treatment groups.

The absence of web fragmentation suggests:

  • Spiders complete web construction despite drug exposure
  • Drug effects manifest primarily as topological changes within connected structures (H1 features)
  • Changes in loop/cell patterns (H1) rather than complete structural breakdown

Therefore, we focus our analysis on H1 (one-dimensional persistence) which captures the relevant differences in cell structure and regularity.

In [9]:
# Compute H0 stats for completeness (not shown in main analysis)
stats_h0 = [rich_stats(bc) for bc in bcs_0];

5.3 Feature Matrices

In [10]:
# H1 rich stats matrix
X_h1, feature_names_h1 = stats_to_matrix(stats_h1)

# H0 rich stats matrix
X_h0, feature_names_h0 = stats_to_matrix(stats_h0)

# Safe z-score normalization (handles constant columns)
function safe_zscore(x)
    s = std(x)
    s == 0 ? zeros(length(x)) : zscore(x)
end

X_h1_normalized = mapslices(safe_zscore, X_h1, dims=1)
X_h0_normalized = mapslices(safe_zscore, X_h0, dims=1)

# Replace any NaN with 0
X_h1_normalized[isnan.(X_h1_normalized)] .= 0.0
X_h0_normalized[isnan.(X_h0_normalized)] .= 0.0

println("H1 features: ", feature_names_h1)
println("Feature matrix size: $(size(X_h1))")
H1 features: [:n_features, :total_persistence, :median_persistence, :std_persistence, :max_persistence, :q25, :q50, :q75, :q90, :entropy, :median_birth, :birth_range]
Feature matrix size: (25, 12)

5.4 Vectorized Diagram Features

In [11]:
# Full vectorization (persistence images, Betti curves, landscapes, etc.)
feature_vectors = [vectorize_diagram(bc) for bc in bcs_1]
X_features = reduce(hcat, feature_vectors)'

println("Vectorized features dimension: $(size(X_features, 2))")
Vectorized features dimension: 362

6 Exploratory Visualization

6.1 Summary Statistics by Drug

In [12]:
println("Mean Statistics by Group:\n")
for sp in unique_species
    idx = findall(==(sp), species)
    if isempty(idx)
        println("$sp: no samples\n")
        continue
    end
    group_stats = stats_h1[idx]

    entropies = [s.entropy for s in group_stats]
    n_feats = [s.n_features for s in group_stats]
    max_pers = [s.max_persistence for s in group_stats]

    mean_entropy = isempty(entropies) ? 0.0 : mean(entropies)
    mean_n_features = isempty(n_feats) ? 0.0 : mean(n_feats)
    mean_max_pers = isempty(max_pers) ? 0.0 : mean(max_pers)

    println("$sp:")
    println("  - Mean cycles (H1): $(round(mean_n_features, digits=1))")
    println("  - Mean entropy: $(round(mean_entropy, digits=3))")
    println("  - Mean max persistence: $(round(mean_max_pers, digits=3))")
    println()
end
Mean Statistics by Group:

CIPERMETRINA:
  - Mean cycles (H1): 65.0
  - Mean entropy: 4.027
  - Mean max persistence: 42.642

CONTROL:
  - Mean cycles (H1): 97.6
  - Mean entropy: 4.5
  - Mean max persistence: 20.157

ENDOSULFAN:
  - Mean cycles (H1): 71.0
  - Mean entropy: 4.072
  - Mean max persistence: 49.734

GLIFOSATO:
  - Mean cycles (H1): 59.2
  - Mean entropy: 3.813
  - Mean max persistence: 73.658

SPINOSAD:
  - Mean cycles (H1): 70.6
  - Mean entropy: 4.117
  - Mean max persistence: 42.952

6.2 Betti Curves by Drug

In [13]:
plot_betti_curves_by_group(bcs_1, species)

6.3 Average Persistence Images

In [14]:
plot_avg_persistence_images(bcs_1, species; size_pi=8, sigma=0.15)

6.4 Within-Group Variability

Some drug groups show more heterogeneity in web structure than others. Below we show the most and least complex webs (by entropy) within each group to illustrate this variability. High entropy indicates regular, uniform cell sizes; low entropy indicates irregular cells.

In [15]:
for sp in unique_species
    idx = findall(==(sp), species)
    if length(idx) >= 2
        group_entropies = [stats_h1[i].entropy for i in idx]
        sorted_idx = idx[sortperm(group_entropies)]

        low_i = sorted_idx[1]
        high_i = sorted_idx[end]

        p_low = plot_wing_with_pd(bcs_1[low_i], As[low_i], samples[low_i],
                                   "$sp - Low entropy (irregular)")
        p_high = plot_wing_with_pd(bcs_1[high_i], As[high_i], samples[high_i],
                                    "$sp - High entropy (regular)")

        combined = pt.plot(p_low, p_high, layout=(1, 2), size=(1200, 400))
        display(combined)
    end
end

6.5 Feature Distributions by Group

The boxplots below show how key TDA features are distributed across treatment groups. This helps visualize group differences before formal statistical testing.

In [16]:
p1 = plot_feature_boxplot([s.entropy for s in stats_h1], species, "Entropy (H1)")
p2 = plot_feature_boxplot([s.n_features for s in stats_h1], species, "Number of Cycles (H1)")
p3 = plot_feature_boxplot([s.max_persistence for s in stats_h1], species, "Max Persistence (H1)")
p4 = plot_feature_boxplot([s.total_persistence for s in stats_h1], species, "Total Persistence (H1)")

pt.plot(p1, p2, p3, p4, layout=(2, 2), size=(900, 700))
Figure 1: Distribution of key TDA features across treatment groups

7 Distance Analysis

The Wasserstein distance (also called Earth Mover’s Distance) measures how different two persistence diagrams are.

Intuition: Imagine each point in a persistence diagram as a pile of dirt. The Wasserstein distance is the minimum “work” needed to transform one diagram into another by moving dirt around.

Why use it for TDA?

  • Specifically designed for comparing persistence diagrams
  • Captures both the locations of topological features and how they should be matched
  • Has metric properties, enabling use with standard machine learning methods (like KNN)

Notation: Wasserstein(p, q) — we use p=1 (sum of movements) and q=2 (Euclidean ground metric).

A small Wasserstein distance means two webs have similar topological structure; a large distance means their persistence diagrams differ substantially.

MDS converts a distance matrix into low-dimensional coordinates for visualization:

  1. Start with pairwise distances between all samples
  2. Find 2D or 3D coordinates that preserve these distances as well as possible
  3. Plot the coordinates — samples close together have similar features

How to interpret MDS plots:

  • Clusters = groups of samples with similar topological features
  • Separation between clusters = distinct TDA signatures between groups
  • Overlap between groups = ambiguity; these groups are hard to distinguish topologically

MDS is purely for visualization — it doesn’t make statistical claims, but helps us see patterns before formal testing.

7.1 Wasserstein Distance Matrix

In [17]:
D_wasserstein = pd_distance_matrix(bcs_1; metric=Wasserstein(1, 2))
25×25 Matrix{Float64}:
   0.0    125.421  149.505   149.996  …  161.391  207.911  158.323  263.38
 125.421    0.0    153.59    135.848     157.259  215.999  133.712  267.336
 149.505  153.59     0.0     182.613     151.695  251.085  160.943  336.517
 149.996  135.848  182.613     0.0       105.588  144.571  101.393  181.717
 266.832  230.788  308.53    212.951     244.25   213.088  184.152  254.225
 299.746  276.566  391.579   242.585  …  306.842  278.482  255.82   263.543
 315.641  293.247  406.906   282.731     338.736  337.27   299.099  289.382
 600.395  578.767  652.597   637.794     641.536  694.576  651.218  747.219
 416.68   389.453  509.701   398.214     456.635  462.956  411.355  407.379
 385.303  360.164  482.388   347.347     412.4    399.277  363.692  338.669
 309.979  283.324  393.786   278.028  …  320.495  319.393  302.347  303.891
 244.195  225.081  334.331   179.349     241.556  202.346  194.103  214.974
 159.134  194.802  257.867   252.589     249.588  264.4    267.983  318.443
 246.778  232.008  299.076   274.348     253.482  255.08   254.396  379.716
 173.598  196.275  247.833   213.122     197.099  210.929  185.743  274.036
 282.592  300.674  272.063   344.175  …  282.728  327.157  303.107  433.23
 397.027  351.363  382.757   400.73      370.031  361.843  377.011  451.677
 128.08   172.289  184.227   192.1       154.853  187.414  197.367  299.317
 325.133  307.064  329.953   287.327     271.733  231.725  283.73   356.261
 141.482  133.883   96.7042  178.781     134.393  235.819  150.86   327.573
 144.326  106.155  184.168   105.203  …  137.01   171.337  131.314  215.534
 161.391  157.259  151.695   105.588       0.0    163.167  127.829  253.459
 207.911  215.999  251.085   144.571     163.167    0.0    165.233  211.368
 158.323  133.712  160.943   101.393     127.829  165.233    0.0    211.254
 263.38   267.336  336.517   181.717     253.459  211.368  211.254    0.0
In [18]:
plot_distance_heatmap_sorted(D_wasserstein, species; title="Wasserstein Distance (H1)")

7.2 Distance Metric Comparison: Wasserstein vs Bottleneck

Different distance metrics capture different aspects of topological dissimilarity. We compare two fundamental persistence diagram distances:

Wasserstein vs Bottleneck: Theoretical Differences
Property Wasserstein W₁ Bottleneck d∞
Definition Optimal matching cost (total transport) Worst-case matching cost (max single distance)
Formula Sum of all point distances in optimal matching Maximum single point distance in optimal matching
Sensitivity Sensitive to all points (global measure) Dominated by outliers (local measure)
Stability More stable in presence of noise Can be unstable with outliers
Interpretation “Average structural difference” “Maximum local difference”
Computation O(n³) via Hungarian algorithm O(n^2.5) via min-cost flow

When to use which? - Wasserstein: When all topological features matter; captures overall structural difference - Bottleneck: When largest discrepancy matters; robust to small noise but sensitive to big changes

7.2.1 Compute Both Distance Matrices

In [19]:
# Wasserstein distance (already computed)
println("Computing Wasserstein distance matrix...")
D_wasserstein = pd_distance_matrix(bcs_1; metric=Wasserstein(1, 2))

# Bottleneck distance
println("Computing Bottleneck distance matrix...")
D_bottleneck = pd_distance_matrix(bcs_1; metric=Bottleneck())

println("\nDistance matrix statistics:")
println("Wasserstein - Min: $(round(minimum(D_wasserstein[D_wasserstein .> 0]), digits=3)), Max: $(round(maximum(D_wasserstein), digits=3)), Mean: $(round(mean(D_wasserstein), digits=3))")
println("Bottleneck  - Min: $(round(minimum(D_bottleneck[D_bottleneck .> 0]), digits=3)), Max: $(round(maximum(D_bottleneck), digits=3)), Mean: $(round(mean(D_bottleneck), digits=3))")
Computing Wasserstein distance matrix...
Computing Bottleneck distance matrix...

Distance matrix statistics:
Wasserstein - Min: 87.923, Max: 799.563, Mean: 296.694
Bottleneck  - Min: 3.207, Max: 53.879, Mean: 24.31

7.2.2 Compare Distance Distributions

In [20]:
# Scatter plot: Do metrics agree on which samples are similar?
using Plots

# Extract upper triangle (avoid diagonal and duplicates)
n = size(D_wasserstein, 1)
wass_dists = Float64[]
bott_dists = Float64[]
for i in 1:n
    for j in (i+1):n
        push!(wass_dists, D_wasserstein[i,j])
        push!(bott_dists, D_bottleneck[i,j])
    end
end

# Compute correlation
dist_correlation = cor(wass_dists, bott_dists)
println("\nCorrelation between distance metrics: $(round(dist_correlation, digits=3))")

# Visualization
p_scatter = pt.scatter(
    wass_dists, bott_dists,
    xlabel="Wasserstein Distance",
    ylabel="Bottleneck Distance",
    title="Distance Metric Comparison (r = $(round(dist_correlation, digits=2)))",
    alpha=0.5,
    legend=false,
    markersize=4
)

# Add diagonal reference line
max_dist = max(maximum(wass_dists), maximum(bott_dists))
pt.plot!(p_scatter, [0, max_dist], [0, max_dist], linestyle=:dash, color=:red, linewidth=2)

p_scatter

Correlation between distance metrics: 0.209

7.2.3 Classification Performance Comparison

In [21]:
# Test classification accuracy with both metrics
println("\n=== Classification Accuracy Comparison ===")

acc_wass = loocv_knn_distance(D_wasserstein, species; k=3)
acc_bott = loocv_knn_distance(D_bottleneck, species; k=3)

println("Wasserstein W₁: $(round(acc_wass * 100, digits=1))%")
println("Bottleneck d∞:  $(round(acc_bott * 100, digits=1))%")
println()

if abs(acc_wass - acc_bott) < 0.03
    println("⇒ Metrics perform SIMILARLY")
    println("  Choice doesn't significantly impact classification")
elseif acc_wass > acc_bott
    println("⇒ Wasserstein OUTPERFORMS Bottleneck")
    println("  Global structure more informative than local extrema")
else
    println("⇒ Bottleneck OUTPERFORMS Wasserstein")
    println("  Largest differences most discriminative")
end

=== Classification Accuracy Comparison ===
Wasserstein W₁: 44.0%
Bottleneck d∞:  20.0%

⇒ Wasserstein OUTPERFORMS Bottleneck
  Global structure more informative than local extrema

7.2.4 Visualize Distance Matrices

In [22]:
plot_distance_heatmap_sorted(D_wasserstein, species; title="Wasserstein Distance (H1)")
In [23]:
plot_distance_heatmap_sorted(D_bottleneck, species; title="Bottleneck Distance (H1)")

7.2.5 Interpretation

In [24]:
println("\n=== Distance Metric Analysis ===\n")

if dist_correlation > 0.8
    println("✓ High correlation (r = $(round(dist_correlation, digits=2)))")
    println("  Metrics largely agree on sample relationships")
    println("  Either metric is appropriate for this dataset")
elseif dist_correlation > 0.5
    println("≈ Moderate correlation (r = $(round(dist_correlation, digits=2)))")
    println("  Metrics capture somewhat different aspects")
    println("  Consider using both for complementary information")
else
    println("✗ Low correlation (r = $(round(dist_correlation, digits=2)))")
    println("  Metrics capture fundamentally different structures")
    println("  Choice significantly impacts conclusions")
end

println("\nRecommendation:")
if acc_wass >= acc_bott
    println("→ Use Wasserstein distance for this dataset")
    println("  Better classification performance")
    println("  Captures overall structural differences relevant to drug effects")
else
    println("→ Use Bottleneck distance for this dataset")
    println("  Better classification performance")
    println("  Largest topological differences most informative")
end

=== Distance Metric Analysis ===

✗ Low correlation (r = 0.21)
  Metrics capture fundamentally different structures
  Choice significantly impacts conclusions

Recommendation:
→ Use Wasserstein distance for this dataset
  Better classification performance
  Captures overall structural differences relevant to drug effects

7.3 Euclidean Distance on Rich Stats

In [25]:
D_rich_stats = pairwise(Euclidean(), X_h1_normalized, dims=1)
plot_distance_heatmap_sorted(D_rich_stats, species; title="Euclidean Distance (Rich Stats)")

7.4 MDS Embeddings

In [26]:
# 2D MDS - Wasserstein
mds_wass_2d = mds_embedding(D_wasserstein; maxoutdim=2)
plot_mds_2d(mds_wass_2d.embedding, species; title="MDS 2D (Wasserstein)")
In [27]:
# 2D MDS - Rich Stats
mds_rich_2d = mds_embedding(D_rich_stats; maxoutdim=2)
plot_mds_2d(mds_rich_2d.embedding, species; title="MDS 2D (Rich Stats)")
In [28]:
# 3D MDS - Wasserstein (disabled: PlotlyJS output not compatible with Quarto)
mds_wass_3d = mds_embedding(D_wasserstein; maxoutdim=3)
plot_3d_scatter(mds_wass_3d.embedding, species; title="MDS 3D (Wasserstein H1)")

8 Statistical Tests

Our statistical analysis follows a three-stage approach:

  1. Omnibus test (Kruskal-Wallis): Do ANY groups differ from each other?
  2. Pairwise comparisons (Permutation tests): WHICH drugs differ from control?
  3. Effect sizes (Cohen’s d): HOW MUCH do they differ?

This hierarchical approach controls false positives while providing interpretable effect magnitudes.

The Kruskal-Wallis test is a non-parametric alternative to one-way ANOVA. We use it here because:

  1. No normality assumption: Unlike ANOVA, it doesn’t require the data to follow a normal distribution — important for TDA features which may have unusual distributions
  2. Robust to outliers: Uses ranks instead of raw values, so extreme points don’t dominate
  3. Works with small samples: Reliable even with limited data per group

How to interpret the p-value:

  • p < 0.05: Strong evidence that at least one group differs from the others (marked with *)
  • p ≥ 0.05: Insufficient evidence to conclude groups differ

Why not use ANOVA? With small sample sizes and potentially non-normal distributions (common in TDA features), Kruskal-Wallis is more reliable and makes fewer assumptions.

8.1 Kruskal-Wallis Tests

In [29]:
println("Kruskal-Wallis Tests for Group Differences:\n")

# Test key features
features_to_test = [
    ("entropy", [s.entropy for s in stats_h1]),
    ("n_features", [s.n_features for s in stats_h1]),
    ("max_persistence", [s.max_persistence for s in stats_h1]),
    ("total_persistence", [s.total_persistence for s in stats_h1]),
]

for (name, values) in features_to_test
    kw = test_group_differences(values, species)
    sig = kw.p_value < 0.05 ? "*" : ""
    println("$name: p = $(round(kw.p_value, digits=4)) $sig")
end
Kruskal-Wallis Tests for Group Differences:

entropy: p = 0.004 *
n_features: p = 0.0495 *
max_persistence: p = 0.0123 *
total_persistence: p = 0.2011 

A p-value tells you if groups differ statistically, but effect size tells you how much they differ in practical terms.

Cohen’s d measures the standardized difference between two group means:

\[d = \frac{\bar{x}_1 - \bar{x}_2}{s_{pooled}}\]

where \(s_{pooled}\) is the pooled standard deviation of both groups.

Interpretation guidelines:

d value
< 0.2 Negligible Groups nearly identical
0.2 – 0.5 Small Detectable but minor difference
0.5 – 0.8 Medium Noticeable practical difference
> 0.8 Large Substantial, meaningful difference

Why effect size matters: With large samples, even tiny differences can be “statistically significant” (p < 0.05) but practically meaningless. Effect size helps distinguish meaningful differences from trivial ones.

A permutation test is a non-parametric method to compute p-values without assuming any particular distribution:

  1. Calculate the observed difference between groups (e.g., difference in mean entropy)
  2. Randomly shuffle group labels many times (e.g., 10,000 permutations)
  3. Recalculate the difference after each shuffle
  4. Count how often the shuffled difference exceeds the observed difference
  5. p-value = (count + 1) / (n_permutations + 1)

Advantages:

  • No distributional assumptions — works for any data
  • Works with any test statistic
  • Provides exact p-values even for small samples
  • Intuitive interpretation: “how often would we see this difference by chance?”

Used here: We compare each drug group to CONTROL using permutation tests to get reliable p-values.

A Note on Multiple Comparisons

When we test multiple features across multiple drug groups, we increase the chance of false positives. With 4 drugs × 3 features = 12 tests at α = 0.05, we expect about 0.6 false positives by chance alone.

Recommendations for interpreting results:

  • Focus on results with p < 0.01 (more stringent threshold)
  • Prioritize findings with large effect sizes (|d| > 0.8)
  • Look for consistent patterns across related features (e.g., both entropy and n_cycles showing similar direction)

Results that meet multiple criteria (low p-value AND large effect size AND consistent pattern) are most reliable.

8.2 Pairwise Drug Comparisons with Effect Sizes

In [30]:
# Compare each drug to CONTROL with effect sizes
comparisons = vcat([
    pairwise_drug_comparison([s.entropy for s in stats_h1], species; feature_name="entropy"),
    pairwise_drug_comparison([s.n_features for s in stats_h1], species; feature_name="n_cycles"),
    pairwise_drug_comparison([s.max_persistence for s in stats_h1], species; feature_name="max_persistence"),
]...)

# Show results
select(comparisons, :drug, :feature, :diff_percent => (x -> round.(x, digits=1)) => :diff_pct,
       :cohens_d => (x -> round.(x, digits=2)) => :cohens_d, :effect_size, :p_value)
12×6 DataFrame
Row drug feature diff_pct cohens_d effect_size p_value
SubStrin… String Float64 Float64 String Float64
1 CIPERMETRINA entropy -10.5 -2.31 large 0.0077
2 ENDOSULFAN entropy -9.5 -1.76 large 0.0304
3 GLIFOSATO entropy -15.3 -2.77 large 0.0029
4 SPINOSAD entropy -8.5 -2.02 large 0.0158
5 CIPERMETRINA n_cycles -33.4 -1.63 large 0.0294
6 ENDOSULFAN n_cycles -27.3 -1.27 large 0.0616
7 GLIFOSATO n_cycles -39.3 -1.86 large 0.0174
8 SPINOSAD n_cycles -27.7 -1.39 large 0.0462
9 CIPERMETRINA max_persistence 111.5 1.46 large 0.0613
10 ENDOSULFAN max_persistence 146.7 1.64 large 0.0382
11 GLIFOSATO max_persistence 265.4 3.16 large 0.0028
12 SPINOSAD max_persistence 113.1 1.99 large 0.0202
In [31]:
# Bonferroni correction for multiple comparisons
n_tests = nrow(comparisons)
bonf_alpha = 0.05 / n_tests
println("Multiple comparison correction:")
println("  Number of tests: $n_tests")
println("  Bonferroni-corrected α = $(round(bonf_alpha, digits=4))")
println("  Comparisons significant after correction: ", sum(comparisons.p_value .< bonf_alpha))
Multiple comparison correction:
  Number of tests: 12
  Bonferroni-corrected α = 0.0042
  Comparisons significant after correction: 2

9 Classification

Beyond hypothesis testing, we can ask: can we automatically identify which drug a spider was exposed to based on its web’s topological features? This is a classification task.

KNN is one of the simplest classification algorithms. To classify a new sample:

  1. Compute the distance from the new sample to all training samples
  2. Find the k nearest neighbors (k closest training samples)
  3. Assign the majority class among those neighbors

Key parameter: k (number of neighbors)

  • Small k (e.g., k=1 or k=3): More sensitive to local patterns, but also to noise
  • Large k (e.g., k=10+): More robust, but may miss subtle differences

We use k=3 as a balanced choice that captures local structure without being overly sensitive to outliers.

Distance metric matters: We test both Wasserstein distance (comparing persistence diagrams directly) and Euclidean distance (comparing extracted features).

The problem: If we train and test on the same data, we get overly optimistic accuracy because the model has “seen” the answers. We need to estimate performance on unseen data.

9.0.1 Leave-One-Out Cross-Validation (LOOCV)

  1. Remove one sample from the dataset
  2. Train the model on the remaining n-1 samples
  3. Predict the class of the held-out sample
  4. Repeat for every sample
  5. Accuracy = proportion of correct predictions

Pros: Uses maximum training data; deterministic (same result every time) Cons: Computationally expensive; can have high variance

9.0.2 K-Fold Cross-Validation

  1. Split data into k equal folds (e.g., k=5)
  2. For each fold: train on k-1 folds, test on the remaining fold
  3. Average accuracy across all folds

Pros: Good balance of bias and variance; faster than LOOCV Cons: Results vary slightly depending on random split (we report mean ± std)

9.0.3 Interpreting Results

  • LOOCV accuracy: Single number, deterministic
  • K-fold accuracy: Reported as mean ± standard deviation
  • Higher accuracy = better classification; >50% for 5 classes means better than random guessing (20%)

Beyond overall accuracy, we report per-class metrics:

Metric What it measures Formula
Precision Of samples predicted as class X, how many are truly X? TP / (TP + FP)
Recall Of samples truly in class X, how many did we identify? TP / (TP + FN)
F1 Score Harmonic mean of precision and recall 2 × (P × R) / (P + R)

Interpreting the confusion matrix:

  • Diagonal elements: Correct predictions (true positives for each class)
  • Off-diagonal elements: Errors — reading row i, column j means “sample truly in class i was predicted as class j”
  • A perfect classifier has all counts on the diagonal

9.1 KNN with Wasserstein Distance (LOOCV)

In [32]:
result_knn_wass = loocv_knn_wasserstein(bcs_1, species; k=3)
println("Accuracy (KNN Wasserstein k=3): $(round(result_knn_wass.accuracy * 100, digits=1))%")

report_wass = classification_report(species, result_knn_wass.predictions)
println("\nPer-class metrics:")
for label in report_wass["labels"]
    m = report_wass[string(label)]
    println("  $label: precision=$(round(m.precision, digits=2)), recall=$(round(m.recall, digits=2)), f1=$(round(m.f1, digits=2))")
end
Accuracy (KNN Wasserstein k=3): 44.0%

Per-class metrics:
  CIPERMETRINA: precision=0.0, recall=0.0, f1=0.0
  CONTROL: precision=0.67, recall=0.8, f1=0.73
  ENDOSULFAN: precision=0.33, recall=0.2, f1=0.25
  GLIFOSATO: precision=0.67, recall=0.4, f1=0.5
  SPINOSAD: precision=0.44, recall=0.8, f1=0.57
In [33]:
plot_confusion_matrix(species, result_knn_wass.predictions; title="KNN Wasserstein (k=3)")

9.2 KNN on Vectorized Features (5-fold CV)

In [34]:
knn_clf = (Xt, yt, Xtest) -> knn_predict(Xt, yt, Xtest; k=3)
cv_features = cross_validate(knn_clf, X_features, species; k=5)
println("Accuracy (KNN vectorized, 5-fold): $(round(cv_features.mean * 100, digits=1))% +/- $(round(cv_features.std * 100, digits=1))%")
Accuracy (KNN vectorized, 5-fold): 28.0% +/- 22.8%

9.2.1 Confusion Matrix: Vectorized Features (LOOCV)

To understand which classes are confused, we run LOOCV to get predictions:

In [35]:
# LOOCV to get predictions for confusion matrix
n_samples = size(X_features, 1)
predictions_features = similar(species)

for i in 1:n_samples
    train_idx = setdiff(1:n_samples, i)
    test_idx = i

    pred = knn_predict(X_features[train_idx, :], species[train_idx],
                       X_features[test_idx:test_idx, :]; k=3)
    predictions_features[i] = pred[1]
end

acc_features_loocv = sum(predictions_features .== species) / n_samples
println("LOOCV Accuracy: $(round(acc_features_loocv * 100, digits=1))%")
LOOCV Accuracy: 24.0%
In [36]:
plot_confusion_matrix(species, predictions_features; title="KNN Vectorized Features (k=3)")
In [37]:
# Show per-class metrics
report_features = classification_report(species, predictions_features)
println("\nPer-class metrics (Vectorized Features):")
for label in report_features["labels"]
    m = report_features[string(label)]
    println("  $label: precision=$(round(m.precision, digits=2)), recall=$(round(m.recall, digits=2)), f1=$(round(m.f1, digits=2))")
end

Per-class metrics (Vectorized Features):
  CIPERMETRINA: precision=0.0, recall=0.0, f1=0.0
  CONTROL: precision=0.0, recall=0.0, f1=0.0
  ENDOSULFAN: precision=0.6, recall=0.6, f1=0.6
  GLIFOSATO: precision=0.33, recall=0.2, f1=0.25
  SPINOSAD: precision=0.2, recall=0.4, f1=0.27

9.3 KNN on Rich Stats (5-fold CV)

In [38]:
cv_rich = cross_validate(knn_clf, X_h1_normalized, species; k=5)
println("Accuracy (KNN rich stats, 5-fold): $(round(cv_rich.mean * 100, digits=1))% +/- $(round(cv_rich.std * 100, digits=1))%")
Accuracy (KNN rich stats, 5-fold): 40.0% +/- 14.1%

9.3.1 Confusion Matrix: Rich Stats (LOOCV)

In [39]:
# LOOCV to get predictions for confusion matrix
predictions_rich = similar(species)

for i in 1:n_samples
    train_idx = setdiff(1:n_samples, i)
    test_idx = i

    pred = knn_predict(X_h1_normalized[train_idx, :], species[train_idx],
                       X_h1_normalized[test_idx:test_idx, :]; k=3)
    predictions_rich[i] = pred[1]
end

acc_rich_loocv = sum(predictions_rich .== species) / n_samples
println("LOOCV Accuracy: $(round(acc_rich_loocv * 100, digits=1))%")
LOOCV Accuracy: 36.0%
In [40]:
plot_confusion_matrix(species, predictions_rich; title="KNN Rich Stats H1 (k=3)")
In [41]:
# Show per-class metrics
report_rich = classification_report(species, predictions_rich)
println("\nPer-class metrics (Rich Stats):")
for label in report_rich["labels"]
    m = report_rich[string(label)]
    println("  $label: precision=$(round(m.precision, digits=2)), recall=$(round(m.recall, digits=2)), f1=$(round(m.f1, digits=2))")
end

Per-class metrics (Rich Stats):
  CIPERMETRINA: precision=0.0, recall=0.0, f1=0.0
  CONTROL: precision=0.67, recall=0.8, f1=0.73
  ENDOSULFAN: precision=0.5, recall=0.4, f1=0.44
  GLIFOSATO: precision=0.67, recall=0.4, f1=0.5
  SPINOSAD: precision=0.17, recall=0.2, f1=0.18

9.4 Classification Comparison

In [42]:
println("=== Classification Methods Comparison ===\n")
println("1. KNN Wasserstein (k=3, LOOCV):        $(round(result_knn_wass.accuracy * 100, digits=1))%")
println("2. KNN Vectorized Features (k=3, 5-fold): $(round(cv_features.mean * 100, digits=1))% +/- $(round(cv_features.std * 100, digits=1))%")
println("3. KNN Rich Stats (k=3, 5-fold):         $(round(cv_rich.mean * 100, digits=1))% +/- $(round(cv_rich.std * 100, digits=1))%")
=== Classification Methods Comparison ===

1. KNN Wasserstein (k=3, LOOCV):        44.0%
2. KNN Vectorized Features (k=3, 5-fold): 28.0% +/- 22.8%
3. KNN Rich Stats (k=3, 5-fold):         40.0% +/- 14.1%

10 Method Comparison: TDA vs Traditional Approaches

To validate that TDA provides unique value beyond mathematical sophistication, we compare against traditional image analysis methods.

10.1 Why Compare Methods?

This comparison answers a critical question for expert reviewers: “Why use TDA when simpler methods might work?”

We test three alternative approaches: 1. PCA on raw pixels: Dimensionality reduction on flattened images 2. Handcrafted features: Domain-informed image statistics 3. TDA methods: Our topological approach (for reference)

What This Comparison Reveals
  • If TDA outperforms: Topological structure captures information that pixel-level methods miss
  • If alternatives perform similarly: TDA may be unnecessarily complex for this problem
  • If PCA dominates: Raw pixel patterns sufficient; topology adds little value

This provides empirical justification (or refutation) of the TDA methodology choice.

10.2 Alternative Feature Extraction

10.2.1 Method 1: PCA on Raw Pixels

In [43]:
using LinearAlgebra

# Flatten images to feature vectors (each image becomes a long vector)
println("Extracting raw pixel features...")
X_pixels_list = []
for img in As
    # Resize to consistent size and flatten
    img_resized = imresize(img, (100, 100))  # Reduce to 100x100 for computational efficiency
    push!(X_pixels_list, vec(Float64.(img_resized)))
end
X_pixels = reduce(hcat, X_pixels_list)'

println("Raw pixel matrix size: $(size(X_pixels))")
println("  ($(size(X_pixels, 1)) samples × $(size(X_pixels, 2)) pixels)")

# Apply PCA for dimensionality reduction
println("\nApplying PCA...")
pca_model = fit(PCA, X_pixels', maxoutdim=20)
X_pixels_pca = predict(pca_model, X_pixels')'

variance_explained = principalratio(pca_model)
println("PCA with 20 components:")
println("  Variance explained: $(round(variance_explained * 100, digits=1))%")
println("  Reduced dimensions: $(size(X_pixels_pca, 2))")
Extracting raw pixel features...
Raw pixel matrix size: (25, 10000)
  (25 samples × 10000 pixels)

Applying PCA...
PCA with 20 components:
  Variance explained: 90.0%
  Reduced dimensions: 20

10.2.2 Method 2: Handcrafted Image Features

In [44]:
using ImageFiltering

function handcrafted_features(img::Matrix)
    # Basic intensity statistics
    mean_intensity = mean(img)
    std_intensity = std(img)
    max_intensity = maximum(img)

    # Edge density (approximate with gradient magnitude)
    grad_y = diff(img, dims=1)
    grad_x = diff(img, dims=2)
    # Pad to match original size
    grad_y = vcat(grad_y, zeros(1, size(img, 2)))
    grad_x = hcat(grad_x, zeros(size(img, 1), 1))
    edge_strength = mean(sqrt.(grad_y.^2 .+ grad_x.^2))

    # Spatial distribution (center of mass)
    h, w = size(img)
    binary_img = img .> 0.1
    if sum(binary_img) > 0
        coords_y = [i for i in 1:h, j in 1:w if binary_img[i,j]]
        coords_x = [j for i in 1:h, j in 1:w if binary_img[i,j]]
        center_y = mean(coords_y)
        center_x = mean(coords_x)
        center_dist = sqrt((center_y - h/2)^2 + (center_x - w/2)^2)

        # Spread (standard deviation of positions)
        spread_y = std(coords_y)
        spread_x = std(coords_x)
    else
        center_dist = 0.0
        spread_y = 0.0
        spread_x = 0.0
    end

    # Density (proportion of non-zero pixels)
    density = mean(img .> 0.1)

    [mean_intensity, std_intensity, max_intensity, edge_strength,
     center_dist, spread_y, spread_x, density]
end

println("Extracting handcrafted features...")
X_handcrafted_list = [handcrafted_features(img) for img in As]
X_handcrafted = reduce(hcat, X_handcrafted_list)'

feature_names_handcrafted = ["mean_intensity", "std_intensity", "max_intensity",
                              "edge_strength", "center_dist", "spread_y", "spread_x", "density"]

println("Handcrafted feature matrix size: $(size(X_handcrafted))")
println("  Features: ", feature_names_handcrafted)

# Normalize handcrafted features
X_handcrafted_norm = mapslices(safe_zscore, X_handcrafted, dims=1)
X_handcrafted_norm[isnan.(X_handcrafted_norm)] .= 0.0
Extracting handcrafted features...
Handcrafted feature matrix size: (25, 8)
  Features: ["mean_intensity", "std_intensity", "max_intensity", "edge_strength", "center_dist", "spread_y", "spread_x", "density"]
0-element view(::Vector{Float64}, Int64[]) with eltype Float64

10.3 Classification Performance Comparison

In [45]:
# Define methods to compare
methods = [
    ("TDA: Rich Stats (H1)", X_h1_normalized),
    ("TDA: Vectorized Diagram", X_features),
    ("PCA on Pixels (20 comp)", X_pixels_pca),
    ("Handcrafted Features", X_handcrafted_norm),
]

# Run 5-fold cross-validation for each method
comparison_results = DataFrame(
    method = String[],
    n_features = Int[],
    accuracy_mean = Float64[],
    accuracy_std = Float64[],
    accuracy_min = Float64[],
    accuracy_max = Float64[]
)

for (method_name, X) in methods
    cv_result = cross_validate(knn_clf, X, species; k=5, seed=42)

    push!(comparison_results, (
        method = method_name,
        n_features = size(X, 2),
        accuracy_mean = cv_result.mean,
        accuracy_std = cv_result.std,
        accuracy_min = minimum(cv_result.scores),
        accuracy_max = maximum(cv_result.scores)
    ))
end

# Sort by accuracy
comparison_results = sort(comparison_results, :accuracy_mean, rev=true)

println("\n=== Method Comparison Results ===\n")
println("Method                        | Features | Accuracy (Mean ± SD)      | Range")
println("-" ^ 80)
for row in eachrow(comparison_results)
    method_padded = rpad(row.method, 29)
    features_padded = lpad(string(row.n_features), 8)
    acc_mean = round(row.accuracy_mean * 100, digits=1)
    acc_std = round(row.accuracy_std * 100, digits=1)
    acc_min = round(row.accuracy_min * 100, digits=1)
    acc_max = round(row.accuracy_max * 100, digits=1)

    println("$method_padded | $features_padded | $(lpad(acc_mean, 4))% ± $(lpad(acc_std, 4))% | [$acc_min%, $acc_max%]")
end

=== Method Comparison Results ===

Method                        | Features | Accuracy (Mean ± SD)      | Range
--------------------------------------------------------------------------------
Handcrafted Features          |        8 | 48.0% ± 11.0% | [40.0%, 60.0%]
TDA: Rich Stats (H1)          |       12 | 40.0% ± 14.1% | [20.0%, 60.0%]
TDA: Vectorized Diagram       |      362 | 28.0% ± 22.8% | [0.0%, 60.0%]
PCA on Pixels (20 comp)       |       20 | 16.0% ±  8.9% | [0.0%, 20.0%]
In [46]:
# Visualization
using Plots

# Create bar plot with error bars
method_labels = comparison_results.method
accuracies = comparison_results.accuracy_mean * 100
errors = comparison_results.accuracy_std * 100

p = pt.bar(
    1:length(method_labels),
    accuracies,
    yerror=errors,
    xlabel="Method",
    ylabel="Classification Accuracy (%)",
    title="Method Comparison (5-fold CV, k=3)",
    legend=false,
    xticks=(1:length(method_labels), method_labels),
    xrotation=45,
    ylims=(0, 100),
    color=[:purple, :purple, :orange, :green],
    fillalpha=0.7,
    size=(800, 500),
    bottom_margin=10pt.mm
)

# Add horizontal line at chance level (20% for 5 classes)
pt.hline!([20.0], linestyle=:dash, color=:red, linewidth=2, label="Chance (20%)")

p

10.4 Interpretation

In [47]:
# Statistical comparison between best TDA and best alternative
best_tda = comparison_results[comparison_results.method .∈ [["TDA: Rich Stats (H1)", "TDA: Vectorized Diagram"]], :]
best_tda_row = first(sort(best_tda, :accuracy_mean, rev=true))

best_alternative = comparison_results[.!(comparison_results.method .∈ [["TDA: Rich Stats (H1)", "TDA: Vectorized Diagram"]]), :]
best_alt_row = first(sort(best_alternative, :accuracy_mean, rev=true))

println("\n=== Key Findings ===\n")
println("Best TDA method: $(best_tda_row.method)")
println("  Accuracy: $(round(best_tda_row.accuracy_mean * 100, digits=1))% ± $(round(best_tda_row.accuracy_std * 100, digits=1))%")
println()
println("Best alternative: $(best_alt_row.method)")
println("  Accuracy: $(round(best_alt_row.accuracy_mean * 100, digits=1))% ± $(round(best_alt_row.accuracy_std * 100, digits=1))%")
println()

advantage = best_tda_row.accuracy_mean - best_alt_row.accuracy_mean
println("TDA advantage: $(round(advantage * 100, digits=1)) percentage points")

if advantage > 0.05
    println("\n✓ TDA OUTPERFORMS traditional methods")
    println("  Topological features capture structure that pixel-level analysis misses")
elseif advantage < -0.05
    println("\n⚠ Traditional methods OUTPERFORM TDA")
    println("  Consider simpler approaches for this problem")
else
    println("\n≈ Methods perform SIMILARLY")
    println("  TDA provides interpretability without accuracy cost")
end

=== Key Findings ===

Best TDA method: TDA: Rich Stats (H1)
  Accuracy: 40.0% ± 14.1%

Best alternative: Handcrafted Features
  Accuracy: 48.0% ± 11.0%

TDA advantage: -8.0 percentage points

⚠ Traditional methods OUTPERFORM TDA
  Consider simpler approaches for this problem

10.5 Strengths and Weaknesses of Each Approach

Method Strengths Weaknesses Best Use Case
TDA Rich Stats • Interpretable features
• Topologically invariant
• Few features (low overfitting)
• Loses spatial info
• Requires TDA expertise
Small samples, need interpretability
TDA Vectorized • Captures full diagram
• Multi-scale information
• High-dimensional
• Harder to interpret
Large samples, complex structure
PCA on Pixels • Simple baseline
• No domain knowledge needed
• Sensitive to rotation/translation
• High-dimensional input
Quick baseline, large datasets
Handcrafted Features • Fast computation
• Domain-informed
• Requires expert feature engineering
• May miss subtle patterns
When domain knowledge available
Why This Matters for Expert Review

Comparing methods demonstrates that:

  1. TDA is not arbitrary: Empirical evidence shows whether topology adds value
  2. Interpretability vs accuracy trade-off: TDA rich stats offer interpretable features with competitive accuracy
  3. Small sample robustness: With N=25, lower-dimensional TDA features (12 dims) may generalize better than high-dimensional pixel features (10,000 dims → 20 PCA components)

This comparison strengthens the methodological contribution by showing TDA provides unique value rather than just mathematical sophistication.

11 Feature Importance

In [48]:
# Compute feature importance via permutation
importances = feature_importance_permutation(knn_clf, X_h1_normalized, species; n_repeats=20)
plot_feature_importance(importances, feature_names_h1; top_n=12, title="H1 Rich Stats Feature Importance")
In [49]:
# Show importance values
importance_df = DataFrame(
    feature = feature_names_h1,
    importance = round.(importances, digits=4)
)
sort(importance_df, :importance, rev=true)
12×2 DataFrame
Row feature importance
Symbol Float64
1 birth_range 0.098
2 std_persistence 0.082
3 q75 0.074
4 q90 0.054
5 entropy 0.05
6 median_birth 0.04
7 n_features 0.022
8 total_persistence 0.022
9 q25 0.008
10 max_persistence -0.0
11 median_persistence -0.022
12 q50 -0.044

12 Biological Interpretation

12.1 Feature Meaning

Dimension Web Structure Interpretation
H0 (components) Disconnected fragments More H0 = broken/fragmented web
H1 (loops/cycles) Closed cells/meshes More H1 = more closed cells
Entropy H1 Cell uniformity High entropy = regular cells
Max persistence H1 Largest hole/gap High = large gap in web

12.2 Drug Effects Summary

In [50]:
# Compare each drug to control
control_idx = findall(==("CONTROL"), species)
if isempty(control_idx)
    println("No CONTROL group found")
else
    control_stats = stats_h1[control_idx]
    control_entropies = [s.entropy for s in control_stats]
    control_h1_counts = [s.n_features for s in control_stats]
    control_mean_entropy = isempty(control_entropies) ? 0.0 : mean(control_entropies)
    control_mean_h1 = isempty(control_h1_counts) ? 0.0 : mean(control_h1_counts)

    println("Drug Effects Compared to CONTROL:\n")
    println("CONTROL baseline - Entropy: $(round(control_mean_entropy, digits=3)), H1 count: $(round(control_mean_h1, digits=1))\n")

    for sp in unique_species
        sp == "CONTROL" && continue

        idx = findall(==(sp), species)
        if isempty(idx)
            println("$sp: no samples\n")
            continue
        end
        drug_stats = stats_h1[idx]

        drug_entropies = [s.entropy for s in drug_stats]
        drug_h1_counts = [s.n_features for s in drug_stats]
        drug_entropy = isempty(drug_entropies) ? 0.0 : mean(drug_entropies)
        drug_h1 = isempty(drug_h1_counts) ? 0.0 : mean(drug_h1_counts)

        effects = String[]

        if drug_h1 < control_mean_h1 * 0.8
            push!(effects, "Fewer closed cells")
        elseif drug_h1 > control_mean_h1 * 1.2
            push!(effects, "More cells than control")
        end

        if drug_entropy < control_mean_entropy * 0.9
            push!(effects, "More irregular cells")
        elseif drug_entropy > control_mean_entropy * 1.1
            push!(effects, "More regular cells")
        end

        effect_str = isempty(effects) ? "Similar to control" : join(effects, ", ")

        # Safe division (avoid div by zero)
        ent_pct = control_mean_entropy == 0 ? 0.0 : (drug_entropy/control_mean_entropy - 1)*100
        h1_pct = control_mean_h1 == 0 ? 0.0 : (drug_h1/control_mean_h1 - 1)*100

        println("$sp:")
        println("  Entropy: $(round(drug_entropy, digits=3)) ($(round(ent_pct, digits=1))%)")
        println("  H1 count: $(round(drug_h1, digits=1)) ($(round(h1_pct, digits=1))%)")
        println("  Effect: $effect_str")
        println()
    end
end
Drug Effects Compared to CONTROL:

CONTROL baseline - Entropy: 4.5, H1 count: 97.6

CIPERMETRINA:
  Entropy: 4.027 (-10.5%)
  H1 count: 65.0 (-33.4%)
  Effect: Fewer closed cells, More irregular cells

ENDOSULFAN:
  Entropy: 4.072 (-9.5%)
  H1 count: 71.0 (-27.3%)
  Effect: Fewer closed cells

GLIFOSATO:
  Entropy: 3.813 (-15.3%)
  H1 count: 59.2 (-39.3%)
  Effect: Fewer closed cells, More irregular cells

SPINOSAD:
  Entropy: 4.117 (-8.5%)
  H1 count: 70.6 (-27.7%)
  Effect: Fewer closed cells

13 Enhanced Separability Analysis

This section provides rigorous statistical evidence for two key hypotheses:

  1. CONTROL is clearly separable from all drug-treated groups
  2. Drug classes are NOT easily separable from each other

13.1 Distance Combination

We combine Wasserstein distance (topological structure) with Euclidean distance (rich statistics features) to potentially improve classification.

In [51]:
# Optimize combination of Wasserstein and Euclidean distances
alpha_results = optimize_alpha(D_wasserstein, D_rich_stats, species;
                               alpha_range=0.0:0.05:1.0, k=3)

println("=== Distance Combination Optimization ===")
println("Best alpha: $(alpha_results.best.alpha)")
println("Best accuracy: $(round(alpha_results.best.accuracy * 100, digits=1))%")
println("\nInterpretation:")
println("  alpha = 1.0 means pure Wasserstein distance")
println("  alpha = 0.0 means pure Euclidean (rich stats) distance")
=== Distance Combination Optimization ===
Best alpha: 0.5
Best accuracy: 44.0%

Interpretation:
  alpha = 1.0 means pure Wasserstein distance
  alpha = 0.0 means pure Euclidean (rich stats) distance
In [52]:
plot_alpha_optimization(alpha_results)
In [53]:
# Create combined distance matrix with optimal alpha
D_combined = combine_distances(D_wasserstein, D_rich_stats; alpha=alpha_results.best.alpha)

# Compare with individual distances
acc_wass = loocv_knn_distance(D_wasserstein, species; k=3)
acc_eucl = loocv_knn_distance(D_rich_stats, species; k=3)
acc_combined = loocv_knn_distance(D_combined, species; k=3)

println("=== Classification Accuracy Comparison ===")
println("Wasserstein only:  $(round(acc_wass * 100, digits=1))%")
println("Euclidean only:    $(round(acc_eucl * 100, digits=1))%")
println("Combined (α=$(alpha_results.best.alpha)): $(round(acc_combined * 100, digits=1))%")
=== Classification Accuracy Comparison ===
Wasserstein only:  44.0%
Euclidean only:    36.0%
Combined (α=0.5): 44.0%

13.2 Binary Classification: Control vs Drug

Collapsing all drugs into a single “DRUG” class tests whether CONTROL can be clearly distinguished from treated webs.

In [54]:
binary_result = binary_classification_control_vs_rest(X_h1_normalized, species; k=3)

println("=== Binary Classification: CONTROL vs DRUG ===")
println("Accuracy: $(round(binary_result.accuracy * 100, digits=1))%")
println("95% CI: [$(round(binary_result.ci_lower * 100, digits=1))%, $(round(binary_result.ci_upper * 100, digits=1))%]")
println("Sensitivity (Control recall): $(round(binary_result.sensitivity * 100, digits=1))%")
println("Specificity (Drug recall): $(round(binary_result.specificity * 100, digits=1))%")
=== Binary Classification: CONTROL vs DRUG ===
Accuracy: 88.0%
95% CI: [75.9%, 100.0%]
Sensitivity (Control recall): 80.0%
Specificity (Drug recall): 90.0%
In [55]:
plot_binary_classification_summary(binary_result; title="Control vs Drug Classification")

13.2.1 ROC Curve Analysis

The ROC curve shows how well we can detect CONTROL samples using distance to the Control centroid.

In [56]:
roc_result = roc_curve_control(X_h1_normalized, species)
println("ROC AUC: $(round(roc_result.auc, digits=3))")
println("\nInterpretation:")
println("  AUC > 0.9: Excellent discrimination")
println("  AUC 0.8-0.9: Good discrimination")
println("  AUC 0.7-0.8: Fair discrimination")
ROC AUC: 0.955

Interpretation:
  AUC > 0.9: Excellent discrimination
  AUC 0.8-0.9: Good discrimination
  AUC 0.7-0.8: Fair discrimination
In [57]:
plot_roc_curve(roc_result)

13.3 Separability Metrics

13.3.1 Within-Class vs Between-Class Distance Ratios

A lower ratio indicates better class separation. Ratios above 0.8 suggest overlapping classes.

In [58]:
# Full 5-class analysis
wb_full = within_between_ratio(D_wasserstein, species)

# Binary grouping (Control vs Drug)
binary_labels = [l == "CONTROL" ? "CONTROL" : "DRUG" for l in species]
wb_binary = within_between_ratio(D_wasserstein, binary_labels)

# Drugs only (excluding Control)
drug_idx = findall(!=("CONTROL"), species)
wb_drugs = within_between_ratio(D_wasserstein[drug_idx, drug_idx], species[drug_idx])

println("=== Within/Between Distance Ratios ===")
println("Full 5-class:        $(round(wb_full.ratio, digits=3)) - $(wb_full.interpretation)")
println("Binary (Ctrl/Drug):  $(round(wb_binary.ratio, digits=3)) - $(wb_binary.interpretation)")
println("Drugs only (4-class): $(round(wb_drugs.ratio, digits=3)) - $(wb_drugs.interpretation)")
=== Within/Between Distance Ratios ===
Full 5-class:        0.739 - moderately separated
Binary (Ctrl/Drug):  0.607 - moderately separated
Drugs only (4-class): 0.839 - overlapping
In [59]:
plot_separability_comparison(
    ["5-class\n(all groups)", "Binary\n(Ctrl vs Drug)", "4-class\n(drugs only)"],
    [wb_full.ratio, wb_binary.ratio, wb_drugs.ratio];
    title="Class Separability"
)

13.3.2 Silhouette Score Analysis

Silhouette scores measure how well-defined each cluster is. Higher is better: - > 0.5: Good separation - 0.25-0.5: Weak separation - < 0.25: Poor separation (overlapping)

In [60]:
sil_result = silhouette_by_class(D_wasserstein, species)

println("=== Silhouette Scores by Class ===")
println("Overall mean: $(round(sil_result.overall_mean, digits=3))")
println()
for (class, stats) in sil_result.by_class
    interp = stats.mean > 0.5 ? "good" : stats.mean > 0.25 ? "weak" : "poor"
    println("$class: $(round(stats.mean, digits=3)) ($interp)")
end
=== Silhouette Scores by Class ===
Overall mean: -0.016

SPINOSAD: 0.024 (poor)
CIPERMETRINA: -0.027 (poor)
CONTROL: -0.013 (poor)
ENDOSULFAN: 0.001 (poor)
GLIFOSATO: -0.067 (poor)
In [61]:
plot_silhouette_by_class(sil_result; title="Silhouette Scores by Class")

13.3.3 Pairwise Group Distances

In [62]:
pairwise_dists = pairwise_group_distances(D_wasserstein, species)
pairwise_dists
15×5 DataFrame
Row group1 group2 mean_distance std_distance n_pairs
String String Float64 Float64 Int64
1 CIPERMETRINA CIPERMETRINA 191.607 59.6837 20
2 CIPERMETRINA CONTROL 403.084 135.293 25
3 CIPERMETRINA ENDOSULFAN 244.223 61.5255 25
4 CIPERMETRINA GLIFOSATO 264.904 90.2555 25
5 CIPERMETRINA SPINOSAD 186.761 60.8101 25
6 CONTROL CONTROL 322.777 202.577 20
7 CONTROL ENDOSULFAN 324.871 164.328 25
8 CONTROL GLIFOSATO 537.583 136.117 25
9 CONTROL SPINOSAD 409.455 147.488 25
10 ENDOSULFAN ENDOSULFAN 223.484 37.978 20
11 ENDOSULFAN GLIFOSATO 325.272 88.6281 25
12 ENDOSULFAN SPINOSAD 253.009 49.6141 25
13 GLIFOSATO GLIFOSATO 276.809 62.9035 20
14 GLIFOSATO SPINOSAD 282.145 90.0662 25
15 SPINOSAD SPINOSAD 178.751 41.1936 20
In [63]:
plot_pairwise_group_heatmap(pairwise_dists; title="Mean Pairwise Wasserstein Distances")

13.4 PERMANOVA Tests

PERMANOVA tests whether group centroids differ significantly in multivariate space. It works directly on the Wasserstein distance matrix.

13.4.1 Control vs Drugs

In [64]:
permanova_ctrl = permanova_control_vs_drugs(D_wasserstein, species; n_permutations=9999)

println("=== PERMANOVA: Control vs Drugs ===")
println("Pseudo-F: $(round(permanova_ctrl.f_statistic, digits=2))")
println("p-value: $(round(permanova_ctrl.p_value, digits=4))")
println()
if permanova_ctrl.p_value < 0.05
    println("✓ CONTROL centroid significantly differs from DRUG centroid (p < 0.05)")
else
    println("✗ No significant difference detected")
end
=== PERMANOVA: Control vs Drugs ===
Pseudo-F: 10.81
p-value: 0.0001

✓ CONTROL centroid significantly differs from DRUG centroid (p < 0.05)

13.4.2 Drug Equivalence Test

Testing whether drug groups differ from each other (excluding CONTROL).

In [65]:
drug_equiv = drug_equivalence_test(D_wasserstein, species; n_permutations=9999)

println("=== PERMANOVA: Among Drugs Only ===")
println("Pseudo-F: $(round(drug_equiv.f_statistic, digits=2))")
println("p-value: $(round(drug_equiv.p_value, digits=4))")
println()
println("Interpretation: $(drug_equiv.interpretation)")
=== PERMANOVA: Among Drugs Only ===
Pseudo-F: 3.25
p-value: 0.0001

Interpretation: Some drug differences detected

13.4.3 Pairwise Drug Comparisons

Testing each pair of drugs to see if they can be statistically distinguished.

In [66]:
# Use entropy as a key feature
drug_perm_tests = pairwise_drug_permutation_tests([s.entropy for s in stats_h1], species;
                                                    n_permutations=10000)
println("=== Pairwise Drug Permutation Tests (Entropy) ===")
drug_perm_tests
=== Pairwise Drug Permutation Tests (Entropy) ===
6×6 DataFrame
Row drug1 drug2 mean_diff p_value significant interpretation
String String Float64 Float64 Bool String
1 CIPERMETRINA ENDOSULFAN 0.0450259 0.70443 false NOT distinguishable
2 CIPERMETRINA GLIFOSATO 0.214446 0.060494 false NOT distinguishable
3 CIPERMETRINA SPINOSAD 0.089826 0.19618 false NOT distinguishable
4 ENDOSULFAN GLIFOSATO 0.259471 0.105689 false NOT distinguishable
5 ENDOSULFAN SPINOSAD 0.0448001 0.716228 false NOT distinguishable
6 GLIFOSATO SPINOSAD 0.304272 0.00939906 true distinguishable

13.5 Confusion Analysis

Which classes are most often confused with each other?

In [67]:
confusion_df = pairwise_confusion_analysis(species, result_knn_wass.predictions)

println("=== Top Confusion Pairs ===")
first(confusion_df, 10)
=== Top Confusion Pairs ===
10×4 DataFrame
Row true_class predicted_class confusion_rate count
String String Float64 Int64
1 CIPERMETRINA SPINOSAD 60.0 3
2 ENDOSULFAN CIPERMETRINA 40.0 2
3 ENDOSULFAN CONTROL 40.0 2
4 GLIFOSATO SPINOSAD 40.0 2
5 CIPERMETRINA ENDOSULFAN 20.0 1
6 CIPERMETRINA GLIFOSATO 20.0 1
7 CONTROL ENDOSULFAN 20.0 1
8 GLIFOSATO CIPERMETRINA 20.0 1
9 SPINOSAD CIPERMETRINA 20.0 1
10 CIPERMETRINA CONTROL 0.0 0

13.6 Summary: Separability Evidence

In [68]:
println("=" ^ 60)
println("SEPARABILITY ANALYSIS SUMMARY")
println("=" ^ 60)

println("\n### Evidence that CONTROL is SEPARABLE ###")
println("Binary classification accuracy: $(round(binary_result.accuracy * 100, digits=1))%")
println("ROC AUC: $(round(roc_result.auc, digits=3))")
println("PERMANOVA (Ctrl vs Drugs) p-value: $(round(permanova_ctrl.p_value, digits=4))")
println("Control silhouette score: $(round(sil_result.by_class["CONTROL"].mean, digits=3))")

ctrl_separable = binary_result.accuracy > 0.85 && permanova_ctrl.p_value < 0.05
println("\nConclusion: ", ctrl_separable ? "✓ CONTROL IS CLEARLY SEPARABLE" : "⚠ Evidence is weak")

println("\n### Evidence that DRUGS are NOT separable ###")
println("Drug-only PERMANOVA p-value: $(round(drug_equiv.p_value, digits=4))")
println("Drugs-only within/between ratio: $(round(wb_drugs.ratio, digits=3))")

# Get mean silhouette for drugs
drug_sils = [sil_result.by_class[k].mean for k in keys(sil_result.by_class) if k != "CONTROL"]
mean_drug_sil = mean(drug_sils)
println("Mean drug silhouette: $(round(mean_drug_sil, digits=3))")

drugs_not_separable = drug_equiv.p_value >= 0.05 || wb_drugs.ratio > 0.7
println("\nConclusion: ", drugs_not_separable ? "✓ DRUGS ARE NOT EASILY SEPARABLE" : "⚠ Some drug differences detected")

println("\n" * "=" ^ 60)
============================================================
SEPARABILITY ANALYSIS SUMMARY
============================================================

### Evidence that CONTROL is SEPARABLE ###
Binary classification accuracy: 88.0%
ROC AUC: 0.955
PERMANOVA (Ctrl vs Drugs) p-value: 0.0001
Control silhouette score: -0.013

Conclusion: ✓ CONTROL IS CLEARLY SEPARABLE

### Evidence that DRUGS are NOT separable ###
Drug-only PERMANOVA p-value: 0.0001
Drugs-only within/between ratio: 0.839
Mean drug silhouette: -0.017

Conclusion: ✓ DRUGS ARE NOT EASILY SEPARABLE

============================================================

14 Limitations and Future Directions

14.1 Methodological Limitations

14.1.1 Sample Size

N=5 per group is insufficient for robust statistical inference

  • Effect size estimates have very wide confidence intervals
  • High risk of Type II error (missing true effects)
  • Classification accuracy likely overestimated due to overfitting
  • Permutation tests have limited precision with small sample sizes

Impact: Results should be viewed as exploratory and hypothesis-generating, not confirmatory.

14.1.2 Parameter Selection

All preprocessing and TDA parameters were chosen heuristically without systematic optimization:

Parameter Value Used Justification
blur 2 Heuristic choice (not optimized)
threshold 0.1 Visual inspection (not data-driven)
sample_size 1000 points Computational convenience (not validated)
rips_cutoff 5 Arbitrary choice (no sensitivity analysis shown)
k (KNN) 3 Standard default (not tuned)
Wasserstein (p=1, q=2) Not compared to alternatives

Impact: Results may be sensitive to these choices. A systematic sensitivity analysis would strengthen conclusions (recommended for future work).

14.1.3 Validation Strategy

No independent validation dataset

  • All reported accuracies use cross-validation on the same 25 samples
  • High risk of overfitting to sample-specific patterns
  • True generalization performance likely lower than reported

Recommended validation hierarchy (for future studies): 1. Level 1: Internal validation (current LOOCV) 2. Level 2: Temporal validation (same cohort, different timepoints) 3. Level 3: External validation (independent lab, different spiders)

14.2 Biological Limitations

14.2.1 Uncontrolled Confounders

The original dataset lacks metadata for critical experimental variables:

  • Spider biology: Species, age, sex, size not recorded
  • Drug protocol: Dosages, exposure duration, administration method unknown
  • Environmental conditions: Temperature, humidity, light not controlled
  • Web collection: Time post-exposure unclear; web completeness varies

Impact: Observed differences could reflect confounding variables rather than drug effects alone.

14.2.2 Mechanism Unclear

TDA detects structural differences but doesn’t explain why:

  • H1 features (loops/cells) may reflect motor control, silk production, cognitive effects, or combinations
  • Different drugs with different mechanisms show similar topological signatures
  • Requires neurobiology/toxicology expertise for causal interpretation

14.2.3 Generalizability Unknown

  • Results specific to one spider species (unidentified in dataset)
  • Drug effects may vary across species, life stages, dosages
  • Environmental context (lab vs. field) not specified
  • Replication on independent datasets essential

14.3 Statistical Concerns

14.3.1 Multiple Testing

  • 12+ statistical tests conducted without strict family-wise error rate control
  • Bonferroni correction mentioned (α = 0.004) but not consistently applied
  • With small N, correction further reduces power

Approach taken: Report raw p-values; prioritize effect sizes and consistency across tests

14.3.2 Cross-Validation Variance

  • LOOCV on N=25 has high variance
  • K-fold CV (k=5) results show wide standard deviations
  • Single train/test split would be even less reliable given small N

14.3.3 Distance Metric Choice

  • Wasserstein distance chosen arbitrarily
  • No comparison to Bottleneck distance or other metrics
  • Different metrics may yield different classification results

14.4 What This Study IS and IS NOT

14.4.1 ✓ What This Study IS

  1. Proof-of-concept demonstrating TDA’s applicability to toxicological screening
  2. Methodological contribution showing topological features capture web structure
  3. Hypothesis-generating exploratory analysis identifying promising features
  4. Reproducible pipeline with documented code and methods

14.4.2 ✗ What This Study is NOT

  1. NOT definitive biological conclusions about drug effects
  2. NOT generalizable beyond this specific dataset
  3. NOT adequately powered for detecting small-to-medium effects
  4. NOT validated on independent data

14.5 Strengths Despite Limitations

Despite small sample size and methodological constraints, this work demonstrates:

  1. Novel application: First TDA analysis of drug-induced spider webs
  2. Clear CONTROL separation: Binary classification (88% accuracy) suggests drugs do affect web topology
  3. Reproducible methods: All code and analysis fully documented
  4. Transparent limitations: Honest acknowledgment of constraints builds trust

14.6 Future Directions

14.6.1 Immediate Improvements (Next Study)

  1. Increase sample size: Target N ≥ 20 per group for adequate power
  2. Independent validation: Collect hold-out test set (70/30 train/test split)
  3. Systematic parameter tuning: Grid search with nested cross-validation
  4. Controlled experiments: Standardize spider species, age, drug dosage, exposure time

14.6.2 Advanced Methodology

  1. Comparison to baselines: Test against CNN classifiers and traditional image features
  2. Multi-scale analysis: Vary filtration parameters systematically
  3. Statistical topology methods: Use recent advances for inference on persistence diagrams
  4. Temporal dynamics: If possible, track same spiders building multiple webs

14.6.3 Biological Integration

  1. Dose-response curves: Test multiple drug concentrations
  2. Behavioral correlates: Link topological features to specific motor/cognitive deficits
  3. Multi-species comparison: Test generalizability across spider families
  4. Mechanism investigation: Use neurobiology techniques to validate TDA findings

14.7 Conclusions with Appropriate Caveats

This exploratory analysis demonstrates that:

  1. TDA captures drug effects: CONTROL webs are topologically distinguishable from drug-treated webs
  2. H1 features are informative: Entropy, cycle count, and persistence metrics vary systematically
  3. Drug classes overlap: Different drugs produce similar topological signatures, suggesting common disruption pathways
  4. Method shows promise: TDA provides interpretable, geometrically-motivated features

However, with N=5 per group and no independent validation, these findings require: - Replication on larger, independent datasets - Systematic parameter validation - Controlled experimental conditions - Biological mechanism investigation

This work is best viewed as methodological proof-of-concept rather than definitive toxicological findings.